Fix speed problem with `top_k > 1` on CPU in edit tree lemmatizer #12017
Conversation
@explosion-bot please test_gpu
URL: https://buildkite.com/explosion-ai/spacy-gpu-test-suite/builds/120
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
I don't completely understand this. Getting rid of the sort makes sense for small values of k. Though, it does lead to the degenerate O(n²) case when someone says "let's try all edit trees". So I am very interested in the empirical results. Maybe it's worth in the end using this approach for small k's and a sort for large k's.
An important point is that even if you say "let's try all edit trees", it only iterates through them until it comes across a tree that can be applied to the raw token text. And in any normal model a good proportion of trees are e.g. 'do nothing' or 'add an …'
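The early-exit behaviour described here can be sketched as follows: candidates are only computed as they are consulted, so most tokens never pay for the alternatives. This is a hypothetical illustration (the function name and the `trees[i](token)` interface are assumptions), not spaCy's actual code:

```python
import numpy as np

def first_applicable_lemma(scores, trees, token, top_k):
    """Return the lemma from the best-scoring applicable edit tree.

    `trees[i](token)` is assumed to return a lemma string, or None when
    tree i cannot be applied to `token` (hypothetical interface).
    """
    remaining = scores.astype(float)  # working copy we can mask
    for _ in range(top_k):
        best = int(remaining.argmax())
        lemma = trees[best](token)
        if lemma is not None:
            return lemma  # most tokens exit here on the first candidate
        remaining[best] = -np.inf  # mask and try the next-best tree
    return token  # no applicable tree: fall back to the surface form
```

Because the loop exits on the first applicable tree, the cost of a large `top_k` is only paid for the minority of tokens whose best-scoring trees do not apply.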
I think it’s still good to have bounds on the complexity if it is easy to do so. These things don’t happen until they do. (E.g. we had a degenerate case in parser feature extraction that wasn’t noticed until a user used it in a way that triggered quadratic complexity.) It’s only a simple if statement: use sort when k is higher than a certain value. Two lines of code for avoiding quadratic complexity seems like an easy trade-off.
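The guard being proposed here could look like this minimal sketch (function and strategy names are hypothetical, and the threshold of 20 is the value discussed later in this PR):

```python
def choose_strategy(top_k: int, sort_threshold: int = 20) -> str:
    """Pick a prediction-ordering strategy, bounding worst-case complexity."""
    if top_k == 1:
        return "argmax"           # no ordering needed at all
    if top_k <= sort_threshold:
        return "repeated-argmax"  # lazy linear scans, cheap for small k
    return "sort"                 # one O(n log n) sort bounds large k
```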
Here are the figures for …

To check the patterns were reproducible for a language with very different morphology, I performed a couple of the experiments with …

This shows it makes sense to retain the pre-existing solution for …
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Looks good to me. I also did some experiments this morning, and it works very nicely.
One small naming nitpick.
Note: merge after 3.5.0 is tagged.
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Description
This PR is one of a series of three (#11583; #11959; #12017) that increase the accuracy of the edit-tree lemmatizer: the cumulative accuracy improvement is shown in the final section below for each language. The mean morphologizer accuracy increased from 95.7% to 96.2% (+0.5%); the mean lemmatizer accuracy increased from 94.3% to 96.0% (+1.7%). The figures are particularly encouraging for the weaker models: while the pre-existing code yields models with accuracies in the mid-80s for a few languages, with the cumulative changes at least 91% is achieved for all languages. Unlike the other two PRs in the series, this PR does not introduce new functionality, but rather improves speed to make it feasible to use existing functionality.
The edit-tree lemmatizer has a hyperparameter `top_k` that specifies the number of alternative predicted trees to consider for each token: if the first predicted tree is not applicable to the raw token text, the second predicted tree is considered, and so on up to the value of `top_k`.

Although accuracies are normally higher if alternative predictions can be considered, the current standard published models all specify `top_k = 1`. This is because the pre-existing code is several times slower when executed on a CPU for any higher value of `top_k`, owing to an expensive NumPy sort that is used to order the predictions for each token (the sort is avoided in the pre-existing code if `top_k == 1`).

This PR retains the existing approach if `top_k == 1`, but introduces a procedural approach if `top_k > 1`. The procedural approach avoids the NumPy sort and also has the important advantage that time is only spent processing alternative predictions if earlier predictions were not applicable. Because with a well-trained model the first predictions are typically applicable to most tokens, this largely removes the performance hit where `top_k > 1` and should allow us to choose values for standard models in the future based purely on accuracy requirements.

When using the edit-tree lemmatizer for its original intended purpose, it is unlikely that values of `top_k`
above about 10 would ever be useful. However, it is conceivable that the code might be used for some other purpose where much higher values of `top_k` are required. For values of `top_k` above about 20, the pre-existing approach is more efficient than the new approach. To take the various scenarios into account, the new code checks the value of `top_k` and selects one of three strategies depending on whether it is `1`; `2 < n <= 20`; or `> 20`.
The following speed figures were measured training `pl_core_news_lg`. The relevant improvement is for `2 < n <= 20`, especially on CPU, although selecting the optimal strategy once for each batch of documents, rather than testing whether `top_k` is greater than 1 each time an individual document is processed, also has a small but consistent positive impact on speed for other scenarios:

*(timing table by `top_k` value not reproduced here)*
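The three-way dispatch described above might look like the following sketch, which yields candidate tree indices in descending score order and falls back to a NumPy sort only for large `top_k` (illustrative code with assumed names, not the actual spaCy implementation):

```python
import numpy as np

def candidate_tree_ids(scores: np.ndarray, top_k: int):
    """Yield up to top_k tree indices in descending score order."""
    n = scores.shape[0]
    if top_k == 1:
        yield int(scores.argmax())            # strategy 1: single argmax
    elif top_k <= 20:
        masked = scores.astype(float)         # strategy 2: repeated argmax,
        for _ in range(min(top_k, n)):        # lazy and cheap for small k
            best = int(masked.argmax())
            yield best
            masked[best] = -np.inf
    else:
        order = np.argsort(-scores)[:top_k]   # strategy 3: one O(n log n) sort
        yield from (int(i) for i in order)
```

Because the function is a generator, a caller that stops at the first applicable tree never pays for the candidates it does not consume, which is the main source of the CPU speedup for small `top_k`.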
The cumulative impact of improvements to the edit-tree lemmatizer
Changes

- `top_k = 5` rather than `top_k = 1`, which was made feasible by this PR;
- … `tok2vec` (Add new features and options to tok2vec to improve accuracy #11583)
- … `tok2vec` and the components it feeds into, and setting a global dropout probability of 0.2 (Add new features and options to tok2vec to improve accuracy #11583)

Approaches that were investigated and abandoned
- … `top_k = 5`, this information could be used to filter out incorrect predictions; however, the positive effect on precision was accompanied by a more or less equal negative effect on recall.

Accuracy
The mean morphologizer accuracy increased from 95.7% to 96.2% (+0.5%); the mean lemmatizer accuracy increased from 94.3% to 96.0% (+1.7%):
There is one transformer-based model, `de_dep_news_trf`, that uses the edit-tree lemmatizer. Three of the five changes listed above are relevant to transformer models; applying them increased the accuracy from 98.7% to 98.9% (+0.2%).

Speed
With the CNN models, lemmatizer and morphologizer inference with `pl_core_news_lg` was measured as being 12.2% slower on CPU and 42.4% slower on GPU with the five changes listed above included than without them.

With `de_dep_news_trf`, which was only run on GPU, the speed penalty was 7.3%.

Types of change
Speed enhancement
Checklist